Asymmetries in the Shanks-Rényi Prime Number Race

Greg Martin 

Abstract. It has been well-observed that an inequality of the type $\pi(x; q, a) >
\pi(x; q, b)$ is more likely to hold if $a$ is a non-square modulo $q$ and $b$ is a square
modulo $q$ (the so-called "Chebyshev Bias"). For instance, each of $\pi(x; 8, 3)$,
$\pi(x; 8, 5)$, and $\pi(x; 8, 7)$ tends to be somewhat larger than $\pi(x; 8, 1)$. However,
it has come to light that the tendencies of these three $\pi(x; 8, a)$ to dominate
$\pi(x; 8, 1)$ have different strengths. A related phenomenon is that the
six possible inequalities of the form $\pi(x; 8, a_1) > \pi(x; 8, a_2) > \pi(x; 8, a_3)$ with
$\{a_1, a_2, a_3\} = \{3, 5, 7\}$ are not all equally likely: some orderings are preferred
over others. In this paper we discuss these phenomena, focusing on the moduli
$q = 8$ and $q = 12$, and we explain why the observed asymmetries (as opposed
to other possible asymmetries) occur.



1. Background 

Let $\pi(x; q, a)$ denote the number of primes not exceeding $x$ that are congruent
to $a$ modulo $q$. We know from the prime number theorem in arithmetic progressions
that the two counting functions $\pi(x; q, a)$ and $\pi(x; q, b)$ are asymptotically
equal as $x$ tends to infinity (as long as $a$ and $b$ are both coprime to $q$). However,
more complicated behavior emerges when we compare these counting functions
for finite values of $x$. Imagine $\pi(x; q, a)$ and $\pi(x; q, b)$ as representing the two
contestants in a race; as the primes are listed in order, the contestant $\pi(x; q, a)$
takes a step each time a prime is congruent to $a$ mod $q$, and similarly for $\pi(x; q, b)$.
How often is the first contestant ahead of the second? This "race game" is easily 
extended to include several contestants. 

As a prime example (!), consider the two contestants $\pi(x; 4, 1)$ and $\pi(x; 4, 3)$.
(We won't pay any attention to the other contestant $\pi(x; 4, 2)$, who, while quick
out of the starting blocks, is rather lacking in endurance.) Chebyshev was the
first to note that there are "many more" primes congruent to 3 (mod 4) than to 1
(mod 4). Indeed, the first value of $x$ for which $\pi(x; 4, 1) > \pi(x; 4, 3)$ is $x = 26{,}861$
(Leech). Even this victory for $\pi(x; 4, 1)$ is short-lived, as 26,861 is the first
of a pair of twin primes, and so $\pi(x; 4, 3)$ catches right back up and does not
relinquish the lead again until $x = 616{,}841$.
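
The race just described is easy to replay on a computer. The following is a minimal sketch, not taken from the paper: the helper names `primes_up_to` and `first_lead` are hypothetical, and the search is only meaningful up to the chosen `limit`. It lists the primes in order and reports the first prime at which one contestant strictly leads the other.

```python
def primes_up_to(limit):
    """Return a list of all primes <= limit (sieve of Eratosthenes)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Cross out the multiples of n starting from n*n.
            sieve[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def first_lead(q, a, b, limit):
    """Smallest prime x <= limit with pi(x; q, a) > pi(x; q, b), or None."""
    count_a = count_b = 0
    for p in primes_up_to(limit):
        r = p % q
        if r == a % q:
            count_a += 1
        elif r == b % q:
            count_b += 1
        if count_a > count_b:
            return p
    return None


# First time the residue class 1 (mod 4) leads the class 3 (mod 4).
print(first_lead(4, 1, 3, 30000))
```

With the limit 30000 this search should reproduce the value 26,861 quoted above; larger limits are possible but the plain Python sieve becomes slow well before the range of the modulus 3 example.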

Similar biases are observed in race games to other moduli, especially to small
moduli. For example, $\pi(x; 3, 1)$ does not exceed $\pi(x; 3, 2)$ for the first time until
$x = 608{,}981{,}813{,}029$ (Bays and Hudson). We can also compare the four
counting functions $\pi(x; 8, a)$ for $a \in \{1, 3, 5, 7\}$. By the time $x$ equals 271, each
of $\pi(x; 8, 3)$, $\pi(x; 8, 5)$, and $\pi(x; 8, 7)$ has been in first place; but $\pi(x; 8, 1)$ does
not even obtain undisputed possession of third place in this four-way race until
$x = 588{,}067{,}889$ (Bays and Hudson).

All of these biases just mentioned are instances of a universal tendency for 
contestants $\pi(x; q, a)$ where $a$ is not a square modulo $q$ to run ahead of contestants
$\pi(x; q, b)$ where $b$ is a square modulo $q$. We briefly indicate why this is the case
through an analytic argument (though see Hudson for a different approach),
at the same time establishing some of the notation to be used throughout this 
paper. We will always assume that the modulus q is fixed, and therefore we will 
not care about the dependence of implicit O-constants on q. 

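Let $\psi(x; q, a)$ have its usual meaning,

\[
\psi(x; q, a) = \sum_{\substack{n \le x \\ n \equiv a \pmod q}} \Lambda(n)
             = \sum_{\substack{p^r \le x \\ p^r \equiv a \pmod q}} \log p,
\]

and set $\psi(x) = \psi(x; 1, 1)$. Under the Generalized Riemann Hypothesis (GRH),
the explicit formula from the proof of the prime number theorem for arithmetic
progressions (see [3]) gives

\[
\psi(x; q, a) = \frac{\psi(x)}{\phi(q)} - \frac{\sqrt{x}}{\phi(q)}
  \sum_{\substack{\chi \pmod q \\ \chi \ne \chi_0}} \bar\chi(a)
  \sum_{\gamma} \frac{x^{i\gamma}}{1/2 + i\gamma} + O(\log^2 x)
\tag{1}
\]

as long as $a$ is coprime to $q$. Here the inner sum is indexed by the imaginary
parts $\gamma$ of the nontrivial zeros of the Dirichlet L-function corresponding to the
character $\chi$, and should be interpreted as

\[
\lim_{T \to \infty} \sum_{|\gamma| < T} \frac{x^{i\gamma}}{1/2 + i\gamma}
\tag{2}
\]

so that it will converge.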

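We isolate the contribution to $\psi(x; q, a)$ from the primes themselves, defining

\[
\theta(x; q, a) = \sum_{\substack{p \le x \\ p \equiv a \pmod q}} \log p,
\]

so that

\[
\theta(x; q, a) = \psi(x; q, a)
  - \sum_{\substack{1 \le b \le q \\ b^2 \equiv a \pmod q}} \theta(x^{1/2}; q, b)
  - \sum_{\substack{1 \le c \le q \\ c^3 \equiv a \pmod q}} \theta(x^{1/3}; q, c)
  - \cdots.
\tag{3}
\]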

Notice that the number of terms in the first sum on the right-hand side of this 
equation equals the number of square roots modulo q possessed by a (in particular, 
the sum is empty if a is a non-square modulo q). Let us define 

\[
c(q, a) = -1 + \#\{1 \le b \le q : b^2 \equiv a \pmod q\},
\tag{4}
\]

so that $c(q, a)$ is an extension of the Legendre symbol $\left(\frac{a}{q}\right)$ to all moduli $q$ (though
not an extension with nice multiplicativity properties); then the number of terms
in the first sum on the right-hand side of equation (3) is exactly $c(q, a) + 1$.
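
The quantity in equation (4) can be tabulated directly for the moduli that appear in this paper. This is a small sketch of that count (the function name `c` simply mirrors the notation of equation (4); nothing else is assumed):

```python
def c(q, a):
    """c(q, a) = -1 + #{1 <= b <= q : b^2 = a (mod q)}, as in equation (4)."""
    return -1 + sum(1 for b in range(1, q + 1) if (b * b - a) % q == 0)


# Reduced residue classes for the moduli discussed in the text.
for q, residues in ((4, (1, 3)), (8, (1, 3, 5, 7)), (12, (1, 5, 7, 11))):
    print(q, {a: c(q, a) for a in residues})
```

For the moduli 8 and 12 the class 1 has four square roots while every other reduced class has none, which is the source of the mean values used later for these two moduli.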

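Invoking the prime number theorem for arithmetic progressions, we have
$\theta(y; q, a) = y/\phi(q) + O(\sqrt{y} \log^2 y)$ for a fixed modulus $q$ (assuming GRH). Using
this fact together with the explicit formula (1), equation (3) becomes

\[
\theta(x; q, a) = \frac{\psi(x)}{\phi(q)} - \frac{\sqrt{x}}{\phi(q)}
  \sum_{\substack{\chi \pmod q \\ \chi \ne \chi_0}} \bar\chi(a)
  \sum_{\gamma} \frac{x^{i\gamma}}{1/2 + i\gamma}
  - \bigl(c(q, a) + 1\bigr) \frac{\sqrt{x}}{\phi(q)} + O(x^{1/3}).
\tag{5}
\]

In particular, we have $\theta(x) = \theta(x; 1, 1) = \psi(x) - \sqrt{x} + O(x^{1/3})$.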

Converting equation (5) to a formula for $\pi(x; q, a)$ involves only a straightforward
partial summation argument. We phrase the final result in terms of a
normalized error term for $\pi(x; q, a)$, namely

\[
E(x; q, a) = \frac{\log x}{\sqrt{x}} \bigl( \phi(q) \pi(x; q, a) - \pi(x) \bigr).
\tag{6}
\]

From equation (5) applied to both $\theta(x; q, a)$ and $\theta(x)$, one can derive [8, Lemma
2.1]

\[
E(x; q, a) = -c(q, a)
  - \sum_{\substack{\chi \pmod q \\ \chi \ne \chi_0}} \bar\chi(a) E(x, \chi)
  + O\Bigl(\frac{1}{\log x}\Bigr),
\tag{7}
\]

where we have defined

\[
E(x, \chi) = \sum_{\gamma} \frac{x^{i\gamma}}{1/2 + i\gamma}
\tag{8}
\]


(interpreted similarly to (2) to ensure its conditional convergence). The behavior
of $E(x; q, a)$ is therefore that of a function oscillating in a roughly bounded fashion
about the mean value $-c(q, a)$, which is positive if $a$ is a non-square (mod $q$) and
negative if $a$ is a square (mod $q$). These two different possible mean values are
the source of the bias towards nonsquare contestants.

Rubinstein and Sarnak [8] have quantified these biases. Define $\delta_{q; a_1, \dots, a_r}$ to be
the logarithmic density of the set of real numbers $x$ such that the inequalities

\[
\pi(x; q, a_1) > \pi(x; q, a_2) > \cdots > \pi(x; q, a_r)
\]

hold, where the logarithmic density of a set $S$ is

\[
\lim_{x \to \infty} \frac{1}{\log x} \int_{[2, x] \cap S} \frac{dt}{t},
\tag{9}
\]

assuming the limit exists. Let us assume not only GRH, but also that the nonnegative
imaginary parts of the nontrivial zeros of all Dirichlet L-functions are
linearly independent over the rationals, a hypothesis we shall abbreviate LI. Under
these assumptions, Rubinstein and Sarnak proved that $\delta_{q; a_1, \dots, a_r}$ always exists
and is strictly positive. They also proved that $\delta_{q; a, b} > \frac{1}{2}$ if and only if $a$ is a
nonsquare (mod $q$) and $b$ is a square (mod $q$), and calculated several of these
densities; for example, $\delta_{4; 3, 1} = 0.9959\ldots$ and $\delta_{3; 2, 1} = 0.9990\ldots$.
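
To make the definition (9) concrete, here is a minimal sketch that evaluates the quantity inside the limit at a finite cutoff for a two-way race. The function name `log_density_of_lead` and the cutoff parameter `X` are assumptions made for this illustration. Because the counting functions are constant on each interval $[n, n+1)$, the integral of $dt/t$ reduces to a sum of terms $\log((n+1)/n)$.

```python
import math


def log_density_of_lead(q, a, b, X):
    """(1/log X) * integral of dt/t over {2 <= t <= X : pi(t;q,a) > pi(t;q,b)}."""
    sieve = bytearray([1]) * (X + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(X) + 1):
        if sieve[n]:
            sieve[n * n :: n] = bytearray(len(range(n * n, X + 1, n)))
    count_a = count_b = 0
    total = 0.0
    for n in range(2, X):
        if sieve[n]:
            if n % q == a % q:
                count_a += 1
            elif n % q == b % q:
                count_b += 1
        # Both counts are constant on [n, n + 1), whose dt/t-measure is log((n+1)/n).
        if count_a > count_b:
            total += math.log((n + 1) / n)
    return total / math.log(X)


print(log_density_of_lead(4, 3, 1, 10**5))
```

Finite cutoffs of this kind converge slowly and are influenced by ties and by the lower limit 2, so the printed number should be read only as a rough indication of the bias, not as an approximation to the densities computed by Rubinstein and Sarnak.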

In joint work with Feuerverger [5], we calculated many other densities (under
the hypotheses GRH and LI). One of our discoveries was that the numerical values
of the densities can vary even when the modulus $q$ is fixed. For example, modulo
8 the only square is 1 while the three nonsquares are 3, 5, and 7, and modulo 12
the only square is 1 while the three nonsquares are 5, 7, and 11; we calculated
that

\[
\begin{aligned}
\delta_{8; 3, 1} &= 0.999569 & & & \delta_{12; 11, 1} &= 0.999977 \\
\delta_{8; 7, 1} &= 0.998938 & &\text{and} & \delta_{12; 5, 1} &= 0.999206 \\
\delta_{8; 5, 1} &= 0.997395 & & & \delta_{12; 7, 1} &= 0.998606.
\end{aligned}
\tag{10}
\]

We also found that for race games involving more than two contestants, certain
orderings of the contestants are more likely than others even if the residue
classes involved are all squares or all nonsquares (mod $q$), a situation that was
foreshadowed in [8]. For example, we calculated that

\[
\begin{aligned}
\delta_{8; 3, 5, 7} = \delta_{8; 7, 5, 3} &= 0.192801 & & & \delta_{12; 5, 7, 11} = \delta_{12; 11, 7, 5} &= 0.198452 \\
\delta_{8; 3, 7, 5} = \delta_{8; 5, 7, 3} &= 0.166426 & &\text{and} & \delta_{12; 7, 5, 11} = \delta_{12; 11, 5, 7} &= 0.179985 \\
\delta_{8; 5, 3, 7} = \delta_{8; 7, 3, 5} &= 0.140772 & & & \delta_{12; 5, 11, 7} = \delta_{12; 7, 11, 5} &= 0.121563.
\end{aligned}
\tag{11}
\]

The goal of this paper is to begin to understand these recently discovered types
of asymmetries. We will focus on the densities listed in equations (10) and (11),
explaining how we could have predicted that $\delta_{8; 5, 1}$ would be smaller than both
$\delta_{8; 3, 1}$ and $\delta_{8; 7, 1}$, for example. The hope is that a very concrete inspection of these
special cases will function as a starting point for a future analysis of the general
case.

2. Error terms and random variables 

The great utility of the hypothesis LI, concerning the linear independence of 
the nonnegative imaginary parts of the nontrivial zeros of Dirichlet L-functions, 
is in facilitating calculations involving those zeros that are based on harmonic 
analysis. In this paper we will phrase these calculations in terms of random 
variables, focusing on underscoring the ideas involved rather than belaboring the 
analytic details. 

Notice that when $\chi$ is a real character (as every character modulo 8 or 12 is),
pairing each zero $\frac12 + i\gamma$ with its complex conjugate lets us write

\[
E(x, \chi) = 2 \sum_{\gamma > 0} \frac{\sin(\gamma \log x + \alpha_\gamma)}{\sqrt{1/4 + \gamma^2}}
\]

for certain real numbers $\alpha_\gamma$ independent of $x$. The hypothesis LI implies that 1/2
can never be a zero of $L(s, \chi)$, which is why we do not have to consider $\gamma = 0$.
Also, under LI, any vector of the form

\[
\{ \sin(\gamma_1 \log x + \alpha_{\gamma_1}), \dots, \sin(\gamma_k \log x + \alpha_{\gamma_k}) \}
\]

becomes uniformly distributed over the $k$-dimensional torus as $x$ tends to infinity.
(It is the presence of $\log x$ in this statement that requires us to define the $\delta_{q; a_1, \dots, a_r}$
as logarithmic densities rather than natural densities.) It can be shown that
$E(x, \chi)$ has a limiting (logarithmic) distribution as $x$ tends to infinity; moreover,
this distribution can also be described in terms of random variables. We now give
this description.

For any positive number $\beta$, let $Z_\beta$ be a random variable that is uniformly
distributed on the unit circle in the complex plane; we make the convention that
$Z_{-\beta} = \overline{Z_\beta}$. We stipulate that any collection $\{Z_\beta\}$ with all of the $\beta$ distinct
and positive is an independent collection of random variables. For any Dirichlet
character $\chi$ (mod $q$), define the random variable

\[
X(\chi) = \sum_{\gamma} \frac{Z_\gamma}{\sqrt{1/4 + \gamma^2}},
\]

where again the sum is indexed by the imaginary parts of the nontrivial zeros of
$L(s, \chi)$. We can also write

\[
X(\chi) = 2 \sum_{\gamma > 0} \frac{X_\gamma}{\sqrt{1/4 + \gamma^2}}
\tag{12}
\]

(since $L(\frac12, \chi) \ne 0$), where the $X_\gamma = \operatorname{Re} Z_\gamma$ are independent random variables each
distributed on $[-1, 1]$ with the sine distribution. One can then show (see [8]),
assuming GRH and LI, that the limiting distribution of $E(x, \chi)$ is identical to the
distribution of the random variable $X(\chi)$. Similarly, it follows from equation (7)
that the limiting distribution of $E(x; q, a)$ is the same as the distribution of the
random variable

\[
-c(q, a) + \sum_{\substack{\chi \pmod q \\ \chi \ne \chi_0}} X(\chi).
\]

(One might expect the summand to be something like $\bar\chi(a) X(\chi)$ rather than
simply $X(\chi)$, but the coefficient $\bar\chi(a)$ disappears early in the argument because
$\bar\chi(a) Z_\gamma$ has the same distribution as $Z_\gamma$ itself.) We remark that the hypothesis
LI implies that the various $X(\chi)$ are mutually independent random variables.

Let us examine these normalized error terms and random variables more concretely
for the moduli $q = 8$ and $q = 12$. For a fundamental discriminant $D$,
let $\chi_D(n) = \left(\frac{D}{n}\right)$ using Kronecker's extension of the Legendre symbol. Then the
three nonprincipal characters (mod 8) are $\chi_{-8}$, $\chi_{-4}$, and $\chi_8$, while the three
nonprincipal characters (mod 12) are $\chi_{-4}$, $\chi_{-3}$, and $\chi_{12}$. (Table 1 explicitly lists the
values taken by these characters. We shall abuse notation a bit and also denote
by $\chi_D$ a character modulo 8 or 12 that is induced by the primitive character $\chi_D$,
whose conductor is $|D|$.)

    $\chi$          $\chi(3)$   $\chi(5)$   $\chi(7)$
    $\chi_{-8}$         1          -1          -1
    $\chi_{-4}$        -1           1          -1
    $\chi_{8}$         -1          -1           1

    $\chi$          $\chi(5)$   $\chi(7)$   $\chi(11)$
    $\chi_{-4}$         1          -1          -1
    $\chi_{-3}$        -1           1          -1
    $\chi_{12}$        -1          -1           1

Table 1. Values of the nonprincipal characters (mod 8) and (mod 12)
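
The entries of Table 1 can be recomputed mechanically. The sketch below is an illustration rather than part of the original argument: it uses the Jacobi symbol, which agrees with the Kronecker symbol $\left(\frac{D}{n}\right)$ whenever $n$ is odd and positive (all the residues 3, 5, 7, 11 in the table qualify). The names `jacobi`, `CHARACTERS`, and `NONSQUARES` are hypothetical.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; it equals the Kronecker symbol there."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):      # (2/n) = -1 exactly when n = 3, 5 (mod 8)
                result = -result
        a, n = n, a                  # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


CHARACTERS = {8: (-8, -4, 8), 12: (-4, -3, 12)}   # discriminants D of chi_D
NONSQUARES = {8: (3, 5, 7), 12: (5, 7, 11)}

for q in (8, 12):
    for D in CHARACTERS[q]:
        print(q, D, [jacobi(D, a) for a in NONSQUARES[q]])
```

Each row contains exactly one value $+1$, so for every nonsquare class $a$ there is exactly one nonprincipal character with $\chi(a) = 1$ and two with $\chi(a) = -1$.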

Now, if we want to consider how often $\pi(x; 8, 3)$ exceeds $\pi(x; 8, 1)$, for example,
we can look at the limiting distribution of the normalized difference $E(x; 8, 3) -
E(x; 8, 1)$ and ask what proportion of that distribution lies above 0. From the
explicit formula (7), we have

\[
\begin{aligned}
E(x; 8, 3) - E(x; 8, 1)
  &= 4 + \sum_{\substack{\chi \pmod 8 \\ \chi \ne \chi_0}} (1 - \chi(3)) E(x, \chi) + O\Bigl(\frac{1}{\log x}\Bigr) \\
  &= 4 + 2E(x, \chi_{-4}) + 2E(x, \chi_8) + O\Bigl(\frac{1}{\log x}\Bigr).
\end{aligned}
\]

Thus $E(x; 8, 3) - E(x; 8, 1)$ has the same limiting distribution as the random
variable $4 + 2X(\chi_{-4}) + 2X(\chi_8)$, where $X(\chi)$ is as in equation (12). In particular,
the density $\delta_{8; 3, 1}$ equals the mass given to the interval $(0, \infty)$ by this limiting
distribution, or in other words simply $\Pr(4 + 2X(\chi_{-4}) + 2X(\chi_8) > 0)$. In fact, if
we define

\[
\begin{aligned}
X_{8; 3, 1} &= 4 + 2X(\chi_{-4}) + 2X(\chi_8) \\
X_{8; 5, 1} &= 4 + 2X(\chi_{-8}) + 2X(\chi_8) \\
X_{8; 7, 1} &= 4 + 2X(\chi_{-8}) + 2X(\chi_{-4})
\end{aligned}
\tag{13}
\]

and

\[
\begin{aligned}
X_{12; 5, 1} &= 4 + 2X(\chi_{-3}) + 2X(\chi_{12}) \\
X_{12; 7, 1} &= 4 + 2X(\chi_{-4}) + 2X(\chi_{12}) \\
X_{12; 11, 1} &= 4 + 2X(\chi_{-4}) + 2X(\chi_{-3}),
\end{aligned}
\tag{14}
\]

then in each case, the distribution of the random variable $X_{q; a, 1}$ is the same
as the limiting distribution of the difference $E(x; q, a) - E(x; q, 1)$, and $\delta_{q; a, 1} =
\Pr(X_{q; a, 1} > 0)$.

If we have several random variables, each with mean 4 and symmetric about
that mean, which ones will take positive values most often? If the random variables
have roughly the same shape, then we expect the ones with the smallest
variance to stay positive most often. So let's compute the variances of the random
variables $X_{q; a, 1}$.

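The variance of any random variable of the form $c X_\gamma$ with $c > 0$ is simply $\frac12 c^2$, and the various $X_\gamma$ are
independent; so if we define $V(\chi) = \operatorname{Var}(X(\chi))$, we see from the definition (12)
of $X(\chi)$ that

\[
V(\chi) = \sum_{\gamma > 0} \frac{2}{1/4 + \gamma^2}.
\tag{15}
\]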
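
The variance formula in equation (15) can be sanity-checked by simulation. The sketch below is illustrative only: the list `TOY_GAMMAS` contains made-up ordinates, not actual zeros of any L-function, and the sum is truncated to those finitely many terms. It draws each $Z_\gamma$ uniformly from the unit circle, forms the corresponding truncated sum from equation (12), and compares the sample variance with the closed form.

```python
import math
import random

TOY_GAMMAS = [1.0, 2.0, 3.0]   # hypothetical ordinates, for illustration only


def sample_X(gammas, rng):
    """One draw of 2 * sum over gamma of Re(Z_gamma) / sqrt(1/4 + gamma^2)."""
    return 2 * sum(
        math.cos(rng.uniform(0.0, 2.0 * math.pi)) / math.sqrt(0.25 + g * g)
        for g in gammas
    )


def variance_formula(gammas):
    """Right-hand side of equation (15), truncated to the given ordinates."""
    return sum(2.0 / (0.25 + g * g) for g in gammas)


def empirical_variance(gammas, n_samples=20000, seed=0):
    rng = random.Random(seed)
    draws = [sample_X(gammas, rng) for _ in range(n_samples)]
    mean = sum(draws) / n_samples
    return sum((d - mean) ** 2 for d in draws) / n_samples


print(variance_formula(TOY_GAMMAS), empirical_variance(TOY_GAMMAS))
```

The two printed numbers should agree up to sampling error, reflecting that the real part of a uniform point on the unit circle has variance 1/2.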

We know that the larger the conductor of a character is, the more numerous and
low-lying (close to the real axis) the zeros of the corresponding $L(s, \chi)$ will be.
In fact, the order of magnitude of the sum in equation (15) is known to be the
logarithm of the conductor of $\chi$, at least on GRH; one can see this from the
formula (see Davenport [3, p. 83])

\[
V(\chi) = \log\frac{q}{\pi} - \gamma_0 - (1 + \chi(-1)) \log 2 + 2 \operatorname{Re} \frac{L'(1, \chi)}{L(1, \chi)}
\tag{16}
\]

    $\chi$          $V(\chi)$
    $\chi_{-3}$     0.11323
    $\chi_{-4}$     0.15557
    $\chi_{8}$      0.23543
    $\chi_{-8}$     0.31607
    $\chi_{12}$     0.33017

Table 2. Values of $V(\chi) = \sum_{\gamma > 0} 2/(1/4 + \gamma^2)$



when $\chi$ is a primitive character (mod $q$), where $\gamma_0$ is Euler's constant. Therefore,
$V(\chi)$ will be larger when the conductor of $\chi$ is large. In particular, we should
expect

\[
V(\chi_{12}) > V(\chi_{-8}) > V(\chi_8) > V(\chi_{-4}) > V(\chi_{-3}),
\tag{17}
\]

and the numerical computation of the variances verifies these expectations (see
Table 2).

Why do we say that we expect $V(\chi_{-8}) > V(\chi_8)$, when the two characters
have the same conductor? There is a secondary phenomenon, namely that the
zeros of L-functions corresponding to even characters tend to be not as low-lying
as those of L-functions corresponding to odd characters (the trivial zero at $s = 0$
of an L-function associated to an even character seems to have a repelling effect
on the nontrivial zeros). Indeed, the term $-(1 + \chi(-1)) \log 2$ in the formula (16)
for $V(\chi)$, which vanishes for odd characters $\chi$, slightly lowers the value of $V(\chi)$
for even characters $\chi$.

Of course this observation would be spurious if the behavior of the real part
of $L'(1, \chi)/L(1, \chi)$ were much different for odd and even characters. While there
is no reason to suspect that this should be the case, it seems hard to say anything
substantial about the distribution of these values (this is a subject that warrants
further investigation). Nevertheless, a look at lists of the first several zeros of
Dirichlet L-functions with small conductor does confirm that the zeros of $L(s, \chi)$
are lower-lying when $\chi$ is odd than when $\chi$ is even.

Returning to equations (13) and (14), we can easily compute the variance of
the random variables $X_{q; a, 1}$ (again since the various $X(\chi)$ are independent by LI).
For example,

\[
\operatorname{Var}(X_{12; 5, 1}) = 4V(\chi_{-3}) + 4V(\chi_{12}) = 4W_{12} - 4V(\chi_{-4}),
\]

where we define

\[
W_q = \sum_{\substack{\chi \pmod q \\ \chi \ne \chi_0}} V(\chi).
\]

In general, we obtain

\[
\begin{aligned}
\operatorname{Var}(X_{8; 3, 1}) &= 4W_8 - 4V(\chi_{-8}) \\
\operatorname{Var}(X_{8; 5, 1}) &= 4W_8 - 4V(\chi_{-4}) \\
\operatorname{Var}(X_{8; 7, 1}) &= 4W_8 - 4V(\chi_8)
\end{aligned}
\]

and

\[
\begin{aligned}
\operatorname{Var}(X_{12; 5, 1}) &= 4W_{12} - 4V(\chi_{-4}) \\
\operatorname{Var}(X_{12; 7, 1}) &= 4W_{12} - 4V(\chi_{-3}) \\
\operatorname{Var}(X_{12; 11, 1}) &= 4W_{12} - 4V(\chi_{12}).
\end{aligned}
\]

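Given the relative sizes of the $V(\chi)$ as listed in equation (17), we see that

\[
\begin{aligned}
\operatorname{Var}(X_{8; 5, 1}) &> \operatorname{Var}(X_{8; 7, 1}) > \operatorname{Var}(X_{8; 3, 1}) \\
\operatorname{Var}(X_{12; 7, 1}) &> \operatorname{Var}(X_{12; 5, 1}) > \operatorname{Var}(X_{12; 11, 1}).
\end{aligned}
\]

This in turn suggests that

\[
\begin{aligned}
\Pr(X_{8; 3, 1} > 0) &> \Pr(X_{8; 7, 1} > 0) > \Pr(X_{8; 5, 1} > 0) \\
\Pr(X_{12; 11, 1} > 0) &> \Pr(X_{12; 5, 1} > 0) > \Pr(X_{12; 7, 1} > 0),
\end{aligned}
\]

or equivalently

\[
\delta_{8; 3, 1} > \delta_{8; 7, 1} > \delta_{8; 5, 1}
\quad\text{and}\quad
\delta_{12; 11, 1} > \delta_{12; 5, 1} > \delta_{12; 7, 1}.
\]

This is exactly what is observed in equation (10).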
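
The variance comparison can be turned into rough numbers. The sketch below makes an assumption that is not part of the paper: it replaces each random variable $X_{q; a, 1}$ by a normal variable with the same mean 4 and the same variance, using the values of $V(\chi)$ from Table 2. The true distributions are not Gaussian, so only the ordering of the outputs should be trusted, not their digits. The names `V`, `RACES`, `variance`, and `normal_heuristic` are hypothetical.

```python
import math

# Values of V(chi_D) from Table 2, keyed by the discriminant D.
V = {-3: 0.11323, -4: 0.15557, 8: 0.23543, -8: 0.31607, 12: 0.33017}

# For the race (q; a, 1): the two discriminants D with chi_D(a) = -1,
# as in equations (13) and (14).
RACES = {
    (8, 3): (-4, 8), (8, 5): (-8, 8), (8, 7): (-8, -4),
    (12, 5): (-3, 12), (12, 7): (-4, 12), (12, 11): (-4, -3),
}


def variance(race):
    """Var(X_{q;a,1}) = 4 V(chi) + 4 V(chi') for the two characters involved."""
    return sum(4 * V[D] for D in RACES[race])


def normal_heuristic(race):
    """Pr(N(4, variance) > 0): a Gaussian stand-in, not the true density."""
    return 0.5 * (1 + math.erf(4 / math.sqrt(2 * variance(race))))


for race in RACES:
    print(race, round(variance(race), 5), round(normal_heuristic(race), 6))
```

Under this stand-in, the smallest variance yields the largest probability of being positive, in the same order as the densities in equation (10).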

We emphasize that although the justification ventured into the analytic realm,
in the end these predictions of the relative sizes of the $\delta_{q; a, 1}$ depended upon only
algebraic properties of the various residue classes $a$ modulo $q$. To each residue
class $a$ was associated a particular character based on the values of the characters
at $a$, and the conductor of this character is what correlated with the size of $\delta_{q; a, 1}$.
This is in the same spirit as Chebyshev's bias: the sign of $\delta_{q; a, b} - 1/2$ was shown
by Rubinstein and Sarnak [8] to be determined by whether the residues $a$ and $b$
are squares in the multiplicative group modulo $q$.

A similar sort of analysis can also explain the relative sizes of the densities
listed in equation (11), for which it is convenient to define a slightly differently
normalized error term for $\pi(x; q, a)$. When $q = 8$ or $q = 12$ and $a$ is one of
the three nonsquare residue classes (mod $q$), we define $\tilde{E}(x; q, a) = E(x; q, a) +
E(x; q, 1)$; again, investigating the relative sizes of the various $\pi(x; q, a)$ is the
same as investigating the relative sizes of the $\tilde{E}(x; q, a)$. For example, since
$c(12, 5) = -1$ and $c(12, 1) = 3$, equation (7) gives

\[
\begin{aligned}
\tilde{E}(x; 12, 5) &= E(x; 12, 5) + E(x; 12, 1) \\
  &= -2 - \sum_{\substack{\chi \pmod{12} \\ \chi \ne \chi_0}} (\chi(5) + 1) E(x, \chi) + O\Bigl(\frac{1}{\log x}\Bigr) \\
  &= -2 - 2E(x, \chi_{-4}) + O\Bigl(\frac{1}{\log x}\Bigr),
\end{aligned}
\]

which has the same limiting distribution as the random variable $-2 + 2X(\chi_{-4})$. In
fact, if we define the random variables

\[
\begin{aligned}
X_{8; 3} &= -2 + 2X(\chi_{-8}) & & & X_{12; 5} &= -2 + 2X(\chi_{-4}) \\
X_{8; 5} &= -2 + 2X(\chi_{-4}) & &\text{and} & X_{12; 7} &= -2 + 2X(\chi_{-3}) \\
X_{8; 7} &= -2 + 2X(\chi_8) & & & X_{12; 11} &= -2 + 2X(\chi_{12}),
\end{aligned}
\]

then in each case the limiting distribution of $\tilde{E}(x; q, a)$ is the same as that of the random
variable $X_{q; a}$. Note also that the three random variables $X_{8; 3}$, $X_{8; 5}$, and $X_{8; 7}$ are
mutually independent due to the hypothesis LI, and the same is true of $X_{12; 5}$,
$X_{12; 7}$, and $X_{12; 11}$.

If we have three independent random variables each with the same mean, 
which one would we expect to take values between those of the other two most 
frequently? Our intuition tells us that the random variable with smallest variance 
will prefer to stay in the middle, while the one with largest variance will more 
frequently be in first or last place. We easily see that the variances for these 
random variables are 

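\[
\begin{aligned}
\operatorname{Var}(X_{8; 3}) &= 4V(\chi_{-8}) & & & \operatorname{Var}(X_{12; 5}) &= 4V(\chi_{-4}) \\
\operatorname{Var}(X_{8; 5}) &= 4V(\chi_{-4}) & &\text{and} & \operatorname{Var}(X_{12; 7}) &= 4V(\chi_{-3}) \\
\operatorname{Var}(X_{8; 7}) &= 4V(\chi_8) & & & \operatorname{Var}(X_{12; 11}) &= 4V(\chi_{12}).
\end{aligned}
\tag{18}
\]

Once again, our knowledge (17) of the relative sizes of the quantities $V(\chi)$ tells
us that

\[
\begin{aligned}
\operatorname{Var}(X_{8; 3}) &> \operatorname{Var}(X_{8; 7}) > \operatorname{Var}(X_{8; 5}) \\
\operatorname{Var}(X_{12; 11}) &> \operatorname{Var}(X_{12; 5}) > \operatorname{Var}(X_{12; 7}).
\end{aligned}
\]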

Therefore, we expect that of the three prime counting functions $\pi(x; 8, a)$ with
$a \in \{3, 5, 7\}$, the function $\pi(x; 8, 3)$ spends more time in first and last place
than the other two while the function $\pi(x; 8, 5)$ spends the most time in second
place; similarly, of the prime counting functions $\pi(x; 12, a)$ with $a \in \{5, 7, 11\}$,
the function $\pi(x; 12, 11)$ spends more time in first and last place than the other
two while the function $\pi(x; 12, 7)$ spends the most time in second place. All of
these predictions match the observed densities in equation (11).
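
The intuition about the middle position can be illustrated with a closed-form computation, under an assumption that is not made in the paper: replace the three independent random variables by independent normal variables with equal means and variances $4V(\chi)$ taken from Table 2. For independent normals $A, B, C$ with a common mean, the differences $A - B$ and $B - C$ are jointly normal with correlation $\rho = -\operatorname{Var}(B)/\sqrt{(\operatorname{Var}(A) + \operatorname{Var}(B))(\operatorname{Var}(B) + \operatorname{Var}(C))}$, and the standard orthant formula gives $\Pr(A > B > C) = \frac14 + \frac{\arcsin \rho}{2\pi}$. The names below are hypothetical.

```python
import math
from itertools import permutations

# Values of V(chi_D) from Table 2, keyed by the discriminant D.
V = {-3: 0.11323, -4: 0.15557, 8: 0.23543, -8: 0.31607, 12: 0.33017}

# Discriminant of the character attached to each nonsquare class, as in (18).
DISCRIMINANT = {(8, 3): -8, (8, 5): -4, (8, 7): 8,
                (12, 5): -4, (12, 7): -3, (12, 11): 12}


def ordering_probability(q, order):
    """Gaussian stand-in for the density of pi(x;q,a1) > pi(x;q,a2) > pi(x;q,a3)."""
    first, middle, last = (4 * V[DISCRIMINANT[(q, a)]] for a in order)
    rho = -middle / math.sqrt((first + middle) * (middle + last))
    return 0.25 + math.asin(rho) / (2 * math.pi)


for q, classes in ((8, (3, 5, 7)), (12, (5, 7, 11))):
    for order in permutations(classes):
        print(q, order, round(ordering_probability(q, order), 4))
```

In this model the ordering with the smallest-variance contestant in the middle is the most likely one and the ordering with the largest-variance contestant in the middle is the least likely, which is the same pattern as in equation (11); the numerical values are only approximations because the true distributions are not normal.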

We emphasize how important it was that the trios of random variables 

\[
\{X_{8; 3}, X_{8; 5}, X_{8; 7}\} \quad\text{and}\quad \{X_{12; 5}, X_{12; 7}, X_{12; 11}\}
\]

were independent, so that we could draw conclusions about their relative posi- 
tions in the three-way race based solely on their individual variances. We could 
certainly have normalized the error terms in an artificial way so that one of the 
resulting random variables in a trio equaled zero, for example! But then the other 
two random variables would not have been independent, and the correlation be- 
tween them would have ruined any chance at such a straightforward analysis. 

We plan to generalize these observations and arguments, as much as possible,
to general moduli $q$ in a future paper. The situation regarding densities of the
form $\delta_{q; a, 1}$ for nonsquares $a$ (mod $q$) will be complicated by the greater complexity
of the multiplicative groups to higher moduli, but we believe that the analysis for
the relative sizes of these two-way densities can be successfully generalized. At
the moment, however, the analysis of the three-way races above relied on the fact
that for every character $\chi$ (mod $q$), at least two of the three values $\chi(a_i)$ were
equal; this is a property that only special triples $\{a_1, a_2, a_3\}$ (mod $q$) can possess.
While these special cases of three-way races to higher moduli can be treated as
above, a new idea will be needed to generalize further.

3. Densities and equalities

Since this is a conference proceedings, it seems appropriate to record here 
some comments made at the Millennial Conference regarding the subject of this 
paper. First, Gerald Tenenbaum mentioned that the density 

\[
\delta_{q; a_1, \dots, a_r} = \lim_{x \to \infty} \frac{1}{\log x}
  \int_{\substack{2 \le t \le x \\ \pi(t; q, a_1) > \cdots > \pi(t; q, a_r)}} \frac{dt}{t},
\tag{19}
\]

as defined in equation (9) and the preceding lines, is not the only possible quantity
to study when measuring the biases of the various orderings of the prime counting
functions $\pi(x; q, a_i)$. Indeed, he noted for example that for any real number
$k > -1$, the related density

\[
\delta^{(k)}_{q; a_1, \dots, a_r} = \lim_{x \to \infty} \frac{k + 1}{(\log x)^{k+1}}
  \int_{\substack{2 \le t \le x \\ \pi(t; q, a_1) > \cdots > \pi(t; q, a_r)}} (\log t)^k \, \frac{dt}{t}
\]

will also exist in this context.

Surprisingly, it turns out that these densities $\delta^{(k)}_{q; a_1, \dots, a_r}$ are independent of the
parameter $k > -1$. One can prove this by hand, using the fact that for any function
of the form $f(x) = \alpha x^\beta$ with $\alpha$ and $\beta$ positive, the (natural) density of those
positive real numbers $x$ for which the fractional part of $f(x)$ lies in an interval
$[\gamma, \eta] \subset [0, 1]$ is exactly $\eta - \gamma$. In fact, the lack of dependence on the parameter $k$ is
a consequence of a more general result of Lau regarding distributions of error
terms of number-theoretic functions. So although the particular definition (19)
is not canonical, the density values themselves seem to be natural quantities to
consider.
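
The independence from $k$ can be observed numerically on a toy set. The sketch below does not use the prime race itself: as a stand-in, it takes the set of $t$ for which the fractional part of $\log t$ is less than `width`, an assumption chosen because the integrals can then be evaluated exactly. After the substitution $u = \log t$, the weighted density becomes $(k+1) U^{-(k+1)} \int u^k \, du$ over the corresponding set of $u \le U = \log x$. The function name `weighted_density` is hypothetical.

```python
import math


def weighted_density(k, U, width=1 / 3, u_min=math.log(2)):
    """(k+1)/U^(k+1) * integral of u^k over {u_min <= u <= U : frac(u) < width}.

    Here u = log t, so this is the k-weighted logarithmic density at cutoff
    x = exp(U) of the toy set {t >= 2 : frac(log t) < width}. Requires k > -1.
    """
    total = 0.0
    for n in range(int(U)):
        lo, hi = max(n, u_min), min(n + width, U)
        if lo < hi:
            # (k + 1) * integral of u^k from lo to hi
            total += hi ** (k + 1) - lo ** (k + 1)
    return total / U ** (k + 1)


for k in (-0.5, 0, 1, 3):
    print(k, round(weighted_density(k, 10000), 5))
```

For each value of $k$ tried, the output is close to the width 1/3 of the toy set, consistent with the statement that the weight $(\log t)^k$ does not change the limiting density.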

On another topic, Rubinstein and Sarnak [8] showed that under the assumptions
GRH and LI, the density of the set of positive real numbers $x$ such that
$\pi(x; q, a) = \pi(x; q, b)$ equals zero (in fact they prove something rather stronger).
Carl Pomerance asked whether one could prove this particular statement unconditionally.
This is an excellent question, and while it certainly might be possible
to establish unconditionally that $\pi(x; q, a)$ and $\pi(x; q, b)$ are "almost never" equal,
this author does not know how to do so.

Since we know (conditionally) that the equality $\pi(x; q, a) = \pi(x; q, b)$ has
arbitrarily large solutions, one can ask whether a system of equalities of the form
$\pi(x; q, a_1) = \cdots = \pi(x; q, a_r)$ also has arbitrarily large solutions. A conjecture
of the author (see [5]), resulting from an analogy to random walks on lattices,
is that the answer is yes when $r = 3$ but no when $r \ge 4$. One can refine this
conjecture in the following way: if $\{a_1, \dots, a_r\}$ are mutually incongruent reduced
residues (mod $q$), then we believe that

\[
\liminf_{x \to \infty} \Bigl( \max_{1 \le i < j \le r} |\pi(x; q, a_i) - \pi(x; q, a_j)| \Bigr)
  = \begin{cases} 0, & \text{if } r \le 3, \\ \infty, & \text{if } r \ge 4. \end{cases}
\tag{20}
\]

Of course this raises the issue as to what function $f(x)$ should be chosen so
that for $r \ge 4$, the quantity

\[
\liminf_{x \to \infty} \frac{\max_{1 \le i < j \le r} |\pi(x; q, a_i) - \pi(x; q, a_j)|}{f(x)}
\tag{21}
\]

would be finite and nonzero (and whether the order of magnitude of this function
$f(x)$ depends on $r$). It follows directly from the fact that the difference $E(x; q, a) -
E(x; q, b)$ possesses a limiting distribution that for any integers $q, r \ge 2$, there
exists some constant $C = C(q, r) > 0$ such that the density of those positive real
numbers $x$ with

\[
\frac{|E(x; q, a) - E(x; q, b)|}{\phi(q)} = \frac{|\pi(x; q, a) - \pi(x; q, b)|}{\sqrt{x}/\log x} > C
\]

is less than $1/r^2$ for any pair $a, b$ of distinct reduced residues modulo $q$. Thus
more than half of the time we must have

\[
\frac{\max_{1 \le i < j \le r} |\pi(x; q, a_i) - \pi(x; q, a_j)|}{\sqrt{x}/\log x} \le C,
\]

since there are only $r(r - 1)/2$ terms in the maximum.

This argument shows that the expression in equation (21) is finite when
$f(x) = \sqrt{x}/\log x$. However, nothing immediately ensures that the expression
is nonzero; if it is zero, the proper choice of $f(x)$ would be somewhat smaller
than $\sqrt{x}/\log x$. In any case, we should begin by trying to establish equation (20)
in the first place, perhaps even in an extreme case such as $r = \phi(q)$.